IBM Research TRECVID-2010 Video Copy Detection and Multimedia Event Detection System
Abstract
In this paper, we describe the system jointly developed by IBM Research and Columbia University for video copy detection and multimedia event detection, applied to the TRECVID-2010 video retrieval benchmark.

A. Content-Based Copy Detection: This year, our copy detection system focused on fusing three types of complementary fingerprints: a keyframe-based color correlogram, a SIFTogram (bag of visual words), and a GIST-based fingerprint. However, our official submissions did not use the color correlogram component, because our best results on the training set came from the GIST and SIFTogram components. Our runs are summarized below:
1. IBM.m.nofa.gistG: A run based on the grayscale GIST frame-level feature, returning at most 1 result per query, except in the case of ties.
2. IBM.m.balanced.gistG: As in the above run, but returning more results per query, though still fewer than 2 on average.
3. IBM.m.nofa.gistGC: The result of the nofa.gistG run, fused with results from GIST features extracted from the R, G, and B color channels.
4. IBM.m.nofa.gistGCsift: The result of the nofa.gistGC run, fused with a SIFTogram result.
Overall, the grayscale GIST approach performed best. When tested on the TRECVID-2009 data set, it produced excellent results, with an optimal NDCR better than the one we had previously achieved with SIFTogram. The “gistG” runs also outperformed our other runs on the 2010 data, although this year we switched to a different SIFT implementation, so our SIFTogram results are not directly comparable with our previous TRECVID results. Our copy detection system did not use any audio features.

B. Multimedia Event Detection: Our MED system design has three aspects: a variety of global, local, and spatial-temporal descriptors; detectors built from a large-scale semantic basis; and temporal motif features. Our runs are summarized below:
1. IBM-CU 2010 MED EVAL cComboAll 1: Combination of all classifiers.
2. IBM-CU 2010 MED EVAL pComboIBM+CUHOF 1: Combination of global image features, spatial-temporal interest points, audio features, and model vector classifiers.
3. IBM-CU 2010 MED EVAL cComboStatic 1: Combination of global image features and model vector classifiers.
4. IBM-CU 2010 MED EVAL cComboDynamic 1: Combination of spatial-temporal interest points, audio features, temporal motif, and HMM classifiers.
5. IBM-CU 2010 MED EVAL cComboIBM+CUHOF 2: Combination of global image features, spatial-temporal interest points, audio features, and model vector classifiers.
6. IBM-CU 2010 MED EVAL cComboIBM-HOF 1: Combination of global image features, spatial-temporal HOG points, and model vector classifiers.
7. IBM-CU 2010 MED EVAL cComboIBM 1: Combination of global image features, spatial-temporal interest points, and model vector classifiers.
8. IBM-CU 2010 MED EVAL cmodelVectorAvg 1: Run with 272 semantic model vector features.
9. IBM-CU 2010 MED EVAL cTemporalMotifs 1: Semantic model vector feature with sequential motifs.
10. IBM-CU 2010 MED EVAL cmvxhmm 1: Semantic model vector feature with hierarchical HMM state histograms.
Overall, the semantic model vector is our best-performing single feature, the combination of dynamic features outperforms the combination of static features, and the temporal motif and hierarchical HMM approaches show promising performance.

Author affiliations: ∗IBM T. J. Watson Research Center, Hawthorne, NY, USA; †Dept. of Computer Science, Columbia University; ‡College of Computing, Georgia Tech; §Dept. of Electrical Engineering, Duke University.
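The copy detection results in the abstract are reported as NDCR. As we understand the TRECVID CCD evaluation plan, the normalized detection cost rate adds the miss probability to a weighted false-alarm rate, with profile-specific constants, and lower values are better; the “nofa” and “balanced” profiles differ mainly in how heavily false alarms are penalized. The “optimal” NDCR is taken at the best decision threshold θ:

```latex
\mathrm{NDCR}(\theta) = P_{\mathrm{Miss}}(\theta) + \beta \, R_{\mathrm{FA}}(\theta),
\qquad
\beta = \frac{C_{\mathrm{FA}}}{C_{\mathrm{Miss}} \, R_{\mathrm{target}}},
\qquad
\mathrm{NDCR}_{\mathrm{opt}} = \min_{\theta} \mathrm{NDCR}(\theta)
```

Here P_Miss is the fraction of true copy queries that are missed, R_FA is the number of false alarms per unit of query time, and C_Miss, C_FA, and R_target are the cost and target-rate constants fixed by each profile; their specific values are not given in this abstract.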
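The “nofa” run policy in the copy detection section (at most one result per query, except for ties) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes each video has already been reduced to a list of fixed-length frame fingerprints (for example GIST vectors; descriptor extraction is not shown), and it swaps in a simple video score (the mean, over query frames, of each frame's best cosine match) that ignores the temporal alignment a real copy detector would check. The function names, the threshold, and the toy fingerprints are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two frame fingerprints."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / max(norm, 1e-12)  # guard against all-zero vectors


def video_score(query: Sequence[Vector], reference: Sequence[Vector]) -> float:
    """Average, over query frames, of each frame's best cosine match in the reference."""
    best_per_frame = [max(cosine(q, r) for r in reference) for q in query]
    return sum(best_per_frame) / len(best_per_frame)


def nofa_select(
    query: Sequence[Vector],
    references: Dict[str, Sequence[Vector]],
    threshold: float,
    tie_tol: float = 1e-9,
) -> List[str]:
    """Return the single best reference id, or several ids only if their scores tie.

    Returns an empty list when no reference scores at or above `threshold`.
    """
    scores = {rid: video_score(query, frames) for rid, frames in references.items()}
    if not scores:
        return []
    best = max(scores.values())
    if best < threshold:
        return []  # no confident match: report nothing rather than risk a false alarm
    return sorted(rid for rid, s in scores.items() if best - s <= tie_tol)


# Hypothetical 3-dimensional fingerprints for two reference videos and one query.
references = {
    "ref_a": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    "ref_b": [[0.0, 0.0, 1.0]],
}
query = [[1.0, 0.1, 0.0], [0.1, 1.0, 0.0]]
print(nofa_select(query, references, threshold=0.9))  # ['ref_a']
```

A “balanced”-style run could instead keep a few top candidates above a lower threshold; the actual thresholds and scoring used in the submitted runs are not stated in the abstract.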
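Most MED runs listed in the abstract are combinations of several classifiers, but the abstract does not state the fusion rule. A common choice, assumed here purely for illustration, is score-level late fusion: z-normalize each classifier's scores over the same set of test videos, then take a weighted average. The classifier names, scores, and weights below are hypothetical.

```python
import statistics
from typing import Dict, List, Optional


def zscore(scores: List[float]) -> List[float]:
    """Rescale one classifier's scores to zero mean and unit (population) std."""
    mean = statistics.fmean(scores)
    std = statistics.pstdev(scores)
    if std == 0:
        return [0.0] * len(scores)  # constant scores carry no ranking information
    return [(s - mean) / std for s in scores]


def late_fusion(
    classifier_scores: Dict[str, List[float]],
    weights: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Weighted average of z-normalized scores; every list scores the same videos in the same order."""
    names = list(classifier_scores)
    n_videos = len(classifier_scores[names[0]])
    if any(len(classifier_scores[name]) != n_videos for name in names):
        raise ValueError("all classifiers must score the same videos")
    if weights is None:
        weights = {name: 1.0 for name in names}  # default: plain average
    total = sum(weights[name] for name in names)
    normalized = {name: zscore(classifier_scores[name]) for name in names}
    return [
        sum(weights[name] * normalized[name][i] for name in names) / total
        for i in range(n_videos)
    ]


# Hypothetical scores from a "static" and a "dynamic" classifier on three videos.
scores = {"static": [0.1, 0.5, 0.9], "dynamic": [2.0, 1.0, 3.0]}
fused = late_fusion(scores)
print([round(f, 3) for f in fused])  # [-0.612, -0.612, 1.225]
```

Normalizing first matters because classifiers built on different features (for example, SVM margins versus probability-like outputs) produce scores on different scales; in practice the weights would be tuned on held-out data rather than fixed by hand.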
Similar resources
INRIA @ TRECVID’2011: Copy Detection & Multimedia Event Detection
In this paper, we present the results of our participation in the TRECVID Copy Detection and Multimedia Event Detection tasks. In particular, it focuses on comparing systems for the CCD task by analyzing the importance of 1) the audio module, 2) the video module, and 3) the fusion module.
IBM Research and Columbia University TRECVID-2011 Multimedia Event Detection (MED) System
The IBM Research/Columbia team investigated a novel range of low-level and high-level features and their combination for the TRECVID Multimedia Event Detection (MED) task. We submitted four runs exploring various methods of extraction, modeling and fusing of low-level features and hundreds of high-level semantic concepts. Our Run 1 developed event detection models utilizing Support Vector Machi...
NTNU-Academia Sinica at TRECVID 2010 Content Based Copy Detection
This paper presents two video copy detection systems built for the TRECVID 2010 content-based copy detection task. Three runs were submitted using video-only content. The two systems differ in feature design as well as in the matching scheme. In this paper, we give an overview of the underlying methodologies and discuss the various design choices for developing a practical video copy detection system.
IBM Research and Columbia University TRECVID-2013 Multimedia Event Detection (MED), Multimedia Event Recounting (MER), Surveillance Event Detection (SED), and Semantic Indexing (SIN) Systems
For this year’s TRECVID Multimedia Event Detection task [11], our team studied a semantic approach to video retrieval. We constructed a faceted taxonomy of 1313 visual concepts (including attributes and dynamic action concepts) and 85 audio concepts. Event search was performed via keyword search with a human user in-the-loop. Our submitted runs included PreSpecified and Ad-Hoc event collections...